Spark: Allow reading timestamp without time zone #48
wmoustafa merged 4 commits into linkedin:li-0.10.x
Branch force-pushed from 88ee1dc to fdc3ed5.
spark/src/test/java/org/apache/iceberg/spark/source/TestTimestampWithoutZone.java
spark/src/test/java/org/apache/iceberg/spark/source/TestTimestampWithoutZone.java
spark/src/main/java/org/apache/iceberg/spark/PruneColumnsWithoutReordering.java
I think what we should do is:
Spark only supports reading timestamps with time zone. However, we have many Hive tables that store timestamps without time zone.
In this PR, we modify the Spark code to allow reading timestamp without time zone as timestamp with time zone. In general, this is not safe: timestamp without time zone is meant to have wall-clock semantics, i.e., regardless of the reader's or writer's time zone, 3 PM should always be read as 3 PM, whereas timestamp with time zone has instant semantics, i.e., the value is adjusted so that the corresponding time in the reader's time zone is displayed. However, at LinkedIn all readers and writers use the UTC time zone, because our production machines are set to UTC, so timestamps with and without time zone behave identically.
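To make the difference concrete, here is a minimal sketch using only `java.time`. It assumes, for illustration, that a timestamp without time zone is stored as microseconds since 1970-01-01T00:00 with no zone attached, and that reading it "as timestamp with time zone" means treating those microseconds as an instant and rendering it in the reader's zone. The class and method names are hypothetical and not part of Iceberg or Spark.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

public class TimestampSemantics {
  // Hypothetical encoding of a wall-clock value ("timestamp without time zone"):
  // microseconds since 1970-01-01T00:00, with no zone attached.
  static long toMicros(LocalDateTime wallClock) {
    return ChronoUnit.MICROS.between(LocalDateTime.of(1970, 1, 1, 0, 0), wallClock);
  }

  // Reinterpret the same microseconds as an instant ("timestamp with time zone")
  // and render it as the local time seen by a reader in readerZone.
  static LocalDateTime readAsInstant(long micros, ZoneId readerZone) {
    Instant instant = Instant.EPOCH.plus(micros, ChronoUnit.MICROS);
    return LocalDateTime.ofInstant(instant, readerZone);
  }

  public static void main(String[] args) {
    LocalDateTime written = LocalDateTime.of(2020, 1, 15, 15, 0); // 3 PM wall clock
    long micros = toMicros(written);

    // UTC reader: the wall-clock value comes back unchanged.
    System.out.println(readAsInstant(micros, ZoneOffset.UTC)); // 2020-01-15T15:00
    // Non-UTC reader: the value is shifted by the reader's offset (PST is UTC-8 in January).
    System.out.println(readAsInstant(micros, ZoneId.of("America/Los_Angeles"))); // 2020-01-15T07:00
  }
}
```

With a UTC reader the wall-clock value round-trips unchanged; with any other reader zone it shifts, which is why the conversion is only safe when every reader and writer runs in UTC.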
We put this feature behind a flag so that the conversion is disabled by default; we will enable the flag at LinkedIn.
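A minimal sketch of the opt-in behavior, assuming the reader rejects timestamp without time zone unless the flag is set. The option name `read-timestamp-without-zone`, the `TimestampKind` enum and the `sparkTypeFor` method are all hypothetical illustrations, not the actual flag or API added in this PR.

```java
import java.util.Collections;
import java.util.Map;

public class TimestampReadPolicy {
  // Hypothetical stand-in for the two timestamp flavors a table can store.
  enum TimestampKind { WITH_ZONE, WITHOUT_ZONE }

  // Hypothetical option name, used only for illustration.
  static final String READ_TIMESTAMP_WITHOUT_ZONE = "read-timestamp-without-zone";

  // Returns the Spark type a column would be read as, honoring the opt-in flag.
  static String sparkTypeFor(TimestampKind kind, Map<String, String> options) {
    if (kind == TimestampKind.WITH_ZONE) {
      return "timestamp"; // Spark's timestamp type already has instant semantics
    }
    // Off by default: only an explicit "true" enables the conversion.
    boolean allowed = Boolean.parseBoolean(options.getOrDefault(READ_TIMESTAMP_WITHOUT_ZONE, "false"));
    if (!allowed) {
      throw new UnsupportedOperationException(
          "timestamp without time zone is not supported unless " + READ_TIMESTAMP_WITHOUT_ZONE + " is enabled");
    }
    // Read as timestamp with time zone; only safe when readers and writers share a zone (e.g. UTC).
    return "timestamp";
  }

  public static void main(String[] args) {
    System.out.println(sparkTypeFor(TimestampKind.WITH_ZONE, Collections.emptyMap()));
    System.out.println(sparkTypeFor(TimestampKind.WITHOUT_ZONE,
        Collections.singletonMap(READ_TIMESTAMP_WITHOUT_ZONE, "true")));
    try {
      sparkTypeFor(TimestampKind.WITHOUT_ZONE, Collections.emptyMap());
    } catch (UnsupportedOperationException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Keeping the default at "off" means a deployment whose readers are not all in UTC never silently shifts wall-clock values; it has to opt in explicitly.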
cc: @HotSushi @wmoustafa